Clone group mapping method based on improved vector space model
CHEN Zhuo, ZHANG Liping, WANG Huan, ZHANG Jiujie, WANG Chunhui
Journal of Computer Applications    2016, 36 (7): 2031-2037.   DOI: 10.11772/j.issn.1001-9081.2016.07.2031
To address the scarcity and low efficiency of mapping methods for Type-3 clone code, a mapping method based on an improved Vector Space Model (VSM) was proposed. The improved VSM was introduced into clone code analysis to obtain an effective clone group mapping method for Type-1, Type-2 and Type-3 clones. Firstly, the clone group documents were preprocessed to remove useless words, and features such as file names and function names were extracted at the same time. Secondly, the word frequency vector space of each clone group was built, and the similarity between clone groups was calculated with the cosine algorithm. Then the mapping between clone groups was constructed from clone group similarity and feature matching, yielding the final clone group mapping result. The method was tested and verified on five open source software systems. It achieves a recall of at least 96.1% and a precision of at least 97.1% with low time consumption. The experimental results show that the proposed method is feasible and provides data support for the analysis of software evolution.
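The similarity step described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stop-word list, the identifier-based tokenizer, the 0.8 threshold, and the function names are all assumptions, and the feature matching on file and function names is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which words are removed.
STOP_WORDS = {"int", "return", "if", "else", "for", "while", "void", "char"}


def term_frequencies(code: str) -> Counter:
    """Build a word-frequency vector from the identifiers in a code fragment."""
    tokens = re.findall(r"[A-Za-z_]\w*", code)
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_clone_groups(old_groups: dict, new_groups: dict, threshold: float = 0.8) -> dict:
    """Map each old group id to its most similar new group id, if any
    candidate reaches the (assumed) similarity threshold."""
    new_vecs = {gid: term_frequencies(src) for gid, src in new_groups.items()}
    mapping = {}
    for old_id, old_src in old_groups.items():
        old_vec = term_frequencies(old_src)
        best_id, best_sim = None, threshold
        for new_id, new_vec in new_vecs.items():
            sim = cosine_similarity(old_vec, new_vec)
            if sim >= best_sim:
                best_id, best_sim = new_id, sim
        if best_id is not None:
            mapping[old_id] = best_id
    return mapping


old = {"g1": "int total(int a, int b) { return a + b; }"}
new = {
    "n1": "int total(int a, int b) { int c = a + b; return c; }",
    "n2": "void log_message(char *msg) { puts(msg); }",
}
print(map_clone_groups(old, new))
```

In this toy input, the old group shares most of its identifiers with `n1` and none with `n2`, so only `n1` is a mapping candidate.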
Evolution pattern recognition and genealogy construction based on clone mapping of versions
ZHANG Jiujie, ZHAI Ye, WANG Chunhui, ZHANG Liping, LIU Dongsheng
Journal of Computer Applications    2016, 36 (7): 2021-2030.   DOI: 10.11772/j.issn.1001-9081.2016.07.2021
To address the complexity of existing methods for building clone genealogies and the urgent need to expand the set of evolution patterns, new clone evolution patterns were proposed, and clone genealogies were built automatically from the mapping relationships of code clones between versions. First, topics of code clones were extracted using Latent Dirichlet Allocation (LDA) from the clone detection results of each released software version. Second, mapping relationships of code clones between versions were determined by the similarities of the topics. Third, evolution patterns were assigned to code clones according to the existing mapping relationships, and evolution features were analyzed. Finally, the clone genealogy was built by integrating the mapping relationships and evolution patterns. Experiments on building clone genealogies were conducted on four open source systems. The experimental results show that the proposed approach is feasible and that the proposed evolution patterns do occur during software evolution. Furthermore, about 90% of code clones in the software systems are stable during evolution, and approximately 67% of clone groups live through less than half of the release versions. The experimental conclusions and related analysis provide strong support for future research as well as for the maintenance and management of code clones.
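The final step of the abstract above, chaining per-version mappings into genealogies, might look something like the sketch below. It assumes the mappings between consecutive versions are already available (the LDA topic extraction is not shown) and that each clone group maps to at most one group in the next version; the function name and the data layout are hypothetical.

```python
def build_genealogies(groups_per_version, mappings):
    """Chain clone-group mappings into lineages.

    groups_per_version: list of sets of group ids, one set per version.
    mappings: list of dicts; mappings[i] maps a group id in version i
              to a group id in version i + 1.
    Returns a list of lineages, each a list of (version_index, group_id).
    """
    # Groups that are the target of a mapping continue an earlier lineage,
    # so they must not start a new one.
    continued = set()
    for v, mapping in enumerate(mappings):
        for target in mapping.values():
            continued.add((v + 1, target))

    lineages = []
    for v, groups in enumerate(groups_per_version):
        for gid in sorted(groups):
            if (v, gid) in continued:
                continue
            lineage = [(v, gid)]
            cur_v, cur = v, gid
            # Follow the mapping forward until the group disappears.
            while cur_v < len(mappings) and cur in mappings[cur_v]:
                cur = mappings[cur_v][cur]
                cur_v += 1
                lineage.append((cur_v, cur))
            lineages.append(lineage)
    return lineages


versions = [{"a", "b"}, {"a1", "c"}, {"a2"}]
maps = [{"a": "a1"}, {"a1": "a2"}]
lineages = build_genealogies(versions, maps)
lifetimes = [len(lineage) for lineage in lineages]
# Share of clone groups that live through less than half of the versions.
short_lived = sum(1 for n in lifetimes if n < len(versions) / 2) / len(lifetimes)
print(lineages, lifetimes, short_lived)
```

A statistic such as the share of short-lived clone groups reported in the abstract can then be read directly off the lineage lengths, as the last lines illustrate on toy data.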
Clone code detection based on Levenshtein distance of token
ZHANG Jiujie, WANG Chunhui, ZHANG Liping, HOU Min, LIU Dongsheng
Journal of Computer Applications    2015, 35 (12): 3536-3543.   DOI: 10.11772/j.issn.1001-9081.2015.12.3536
To address the shortage and low efficiency of existing detection tools for Type-3 clone code, an effective Type-3 clone code detection method based on the Levenshtein distance of tokens was proposed. The method can efficiently detect Type-1, Type-2 and Type-3 clone code. Firstly, the source code of a subject system was tokenized into token sequences of a specified code size. Secondly, each fixed-size substring of the token sequences was mapped to a corresponding index. Thirdly, clone pairs were built with the Levenshtein distance algorithm, and clone groups were built with the disjoint-set algorithm, both on the basis of queries against the mapping information. Finally, feedback information about the clone code was reported. A prototype tool named FClones was implemented. It was evaluated with a code mutation-based framework and compared with two state-of-the-art tools, SimCad and NiCad. The experimental results show that the recall of FClones is at least 95% and its precision is at least 98% in detecting all three types of clone code. FClones performs better than the other tools in detecting Type-3 clones.
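The pairing and grouping steps in the abstract above can be illustrated with a small sketch. It is not the FClones implementation: the index-based candidate lookup is replaced by a plain all-pairs comparison, and the normalized distance threshold `max_ratio`, the class name, and the function names are assumptions made for the example.

```python
def levenshtein(a, b):
    """Edit distance between two token sequences (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


class DisjointSet:
    """Union-find with path compression."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        root_i, root_j = self.find(i), self.find(j)
        if root_i != root_j:
            self.parent[root_j] = root_i


def clone_groups(fragments, max_ratio=0.3):
    """Group fragments whose normalized token edit distance is small.

    fragments: list of token lists. Returns lists of fragment indices.
    """
    n = len(fragments)
    ds = DisjointSet(n)
    for i in range(n):
        for j in range(i + 1, n):
            longest = max(len(fragments[i]), len(fragments[j]))
            if longest and levenshtein(fragments[i], fragments[j]) / longest <= max_ratio:
                ds.union(i, j)  # fragments i and j form a clone pair
    groups = {}
    for i in range(n):
        groups.setdefault(ds.find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]


fragments = [
    ["int", "id", "=", "id", "+", "id", ";"],
    ["int", "id", "=", "id", "-", "id", ";"],
    ["return", "id", ";"],
]
print(clone_groups(fragments))
```

Working on normalized tokens (identifiers replaced by `id` here) is what lets a distance-based comparison tolerate renamed variables, while the distance threshold tolerates the small insertions and deletions typical of Type-3 clones.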